
Verification of Markov Decision Processes using Learning Algorithms


Abstract

We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations. The first assumes that full knowledge of the MDP is available, and performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP, and yields probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stopping criteria for the approximation. The latter is the first extension of statistical model-checking for unbounded properties in MDPs. In contrast with other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular properties of the MDP. We also show how our techniques extend to LTL objectives. We present experimental results showing the performance of our framework on several examples.
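The abstract's central idea is to maintain both a lower and an upper bound on the maximum probability of reaching a target set, tightening them until their gap is small enough to stop. The sketch below illustrates that two-sided bounding as plain interval value iteration on a toy MDP; it is not the paper's heuristic-driven partial-exploration or sampling algorithm. The transition structure, state names, and the EPS threshold are assumptions made for the example, and it presumes the only end components are the absorbing "goal" and "sink" states (the paper's techniques handle general MDPs without that restriction).

```python
# Minimal sketch: lower/upper bounds on P_max(reach "goal") via interval-style
# value iteration on a small, hand-made MDP. Illustrative only; the model,
# state names, and EPS are assumptions, not taken from the paper.

EPS = 1e-6

# MDP: state -> action -> list of (successor, probability)
mdp = {
    "s0": {"a": [("s1", 0.5), ("s2", 0.5)], "b": [("s2", 0.2), ("sink", 0.8)]},
    "s1": {"a": [("goal", 0.9), ("sink", 0.1)]},
    "s2": {"a": [("goal", 0.4), ("sink", 0.6)]},
    "goal": {},   # target, absorbing
    "sink": {},   # non-target, absorbing
}

# Initial bounds: the goal is reached with probability 1, the sink with 0;
# for every other state we only know the trivial interval [0, 1].
lower = {s: 1.0 if s == "goal" else 0.0 for s in mdp}
upper = {s: 0.0 if s == "sink" else 1.0 for s in mdp}

def bellman(bounds, state):
    """Max over actions of the expected bound over successors."""
    if not mdp[state]:                       # absorbing: bound stays fixed
        return bounds[state]
    return max(
        sum(p * bounds[succ] for succ, p in succs)
        for succs in mdp[state].values()
    )

# Sweep Bellman updates on both bounds until the gap at the initial state
# falls below EPS. At every step the true reachability probability lies in
# [lower["s0"], upper["s0"]], which is what makes the gap a sound stopping
# criterion (assuming no end components besides "goal" and "sink").
while upper["s0"] - lower["s0"] > EPS:
    lower = {s: bellman(lower, s) for s in mdp}
    upper = {s: bellman(upper, s) for s in mdp}

print(f"P_max(reach goal from s0) in [{lower['s0']:.6f}, {upper['s0']:.6f}]")
```

On this toy model both bounds meet at 0.65 after two sweeps. The paper's instantiations aim for the same style of two-sided guarantee while exploring, or sampling, only the part of the state space that the current bounds identify as relevant.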
